Between 1991 and 2017, housing quality in New York improved dramatically; however, some sectors of the housing stock continue to face poor conditions, and some specific maintenance deficiencies remain more prevalent than others. In this project, we develop an index of poor housing quality in New York that measures physical deficiencies and shows how their prevalence has shifted over time.
The index is a weighted sum over 22 variables selected by the authors. A variable was included if the authors agreed that it describes poor housing conditions. The index is not exhaustive, and additional data could be collected to better suit our purpose.
| Item | Description | NYCHVS Variable | Score |
|---|---|---|---|
| 1 | Exterior Walls: Missing brick, siding, or other | d1 | 2 |
| 2 | Exterior Walls: Sloping or bulging walls | d2 | 2 |
| 3 | Exterior Walls: Major cracks | d3 | 2 |
| 4 | Exterior Walls: Loose or hanging cornice, roof, etc. | d4 | 2 |
| 5 | Interior Walls: Cracks or holes | 36a | 2 |
| 6 | Interior Walls: Broken plaster or peeling paint | 37a | 2 |
| 7 | Broken or missing windows | e1 | 5 |
| 8 | Rotten or loose windows | e2 | 2 |
| 9 | Boarded up windows | e3 | 3 |
| 10 | Sagging or sloping floors | g1 | 2 |
| 11 | Slanted/shifted doorsills or frames | g2 | 2 |
| 12 | Deep wear in floor causing depressions | g3 | 2 |
| 13 | Holes or missing flooring | g4 | 2 |
| 14 | Stairs: Loose, broken, or missing stair | f1 | 2 |
| 15 | Stairs: Loose, broken, or missing steps | f2 | 2 |
| 16 | No interior steps or stairways | f4 | 2 |
| 17 | No exterior steps or stairways | f5 | 2 |
| 18 | Number of heating equipment breakdowns | 32b | 2 per breakdown |
| 19 | Kitchen facilities functioning | 26c | 3 if no, 5 if no kitchen facilities |
| 20 | Toilet breakdowns | 25c | 3 if any, 5 if no toilet or plumbing |
| 21 | Presence of mice or rats | 35a | 3 |
| 22 | Water Leakage | 38a | 3 |
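
The scoring table above can be read as a simple additive rule. The following is a minimal Python sketch of that rule, not the authors' implementation. The variable names and point values come from the table; the input encoding is an assumption made for illustration (boolean flags for the flat-score items, an integer count for `32b`, and the hypothetical string codes `"yes"`/`"no"`/`"none"` for `26c` and `"no"`/`"any"`/`"none"` for `25c`). The actual NYCHVS coding differs and would need to be recoded first.

```python
# Flat-score items from the table: points are added when the deficiency is present.
FLAT_SCORES = {
    "d1": 2, "d2": 2, "d3": 2, "d4": 2,   # exterior walls
    "36a": 2, "37a": 2,                    # interior walls
    "e1": 5, "e2": 2, "e3": 3,             # windows
    "g1": 2, "g2": 2, "g3": 2, "g4": 2,    # floors
    "f1": 2, "f2": 2, "f4": 2, "f5": 2,    # stairs
    "35a": 3, "38a": 3,                    # rodents, water leakage
}

# Hypothetical codes for the two graded items (assumed, not the NYCHVS coding).
KITCHEN_SCORES = {"yes": 0, "no": 3, "none": 5}   # 26c: functioning / not / no facilities
TOILET_SCORES = {"no": 0, "any": 3, "none": 5}    # 25c: no breakdown / any / no toilet


def poor_quality_index(unit):
    """Return the poor quality score for one unit given as a dict."""
    score = sum(pts for var, pts in FLAT_SCORES.items() if unit.get(var, False))
    score += 2 * unit.get("32b", 0)                 # 2 points per heating breakdown
    score += KITCHEN_SCORES[unit.get("26c", "yes")]
    score += TOILET_SCORES[unit.get("25c", "no")]
    return score


example_unit = {"e1": True, "35a": True, "32b": 2, "26c": "no"}
print(poor_quality_index(example_unit))
```

In this sketch, missing keys are treated as "no deficiency". That is a design choice for the illustration only; how missing survey responses should be handled in the real index is not specified in the text.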
Figure 1 shows the poor quality index scores for the 156,230 occupied units in the New York Housing Dataset from 1991 to 2017. The frequency distribution is skewed to the right. Overall, forty-five percent of the units scored 0. The highest score, 54 points, occurred in 1993. The year 2008 had the highest share (64%) of units with a poor quality score of 0.
Figure 2 shows the percent of occupied units with poor quality scores. Over the period from 1991 to 2017, most of the units had poor quality scores between 1 and 10 points; very few units had poor quality scores over 20 points.
Figure 3 tracks trends in poor quality index scores from 1991 to 2017. We report the means, medians, and the 75th, 95th, and 99th percentiles. In most years, the median poor quality score was 0. The mean fell from 4.0 in 1991 to 2.5 in 2017. The 99th percentile clearly shows the improvement of housing in New York (from 25 poor quality points in 1991 to 18 poor quality points in 2017).
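
As an illustration of the per-year summary described above, here is a minimal Python sketch that groups scores by survey year and reports the mean, median, and upper percentiles. The input rows are made-up toy values, not survey data, and the nearest-rank percentile definition is an assumption; the original analysis may have used a different quantile definition, which can give slightly different values.

```python
import math
import statistics
from collections import defaultdict


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_by_year(rows):
    """rows is an iterable of (year, score) pairs; returns {year: summary dict}."""
    by_year = defaultdict(list)
    for year, score in rows:
        by_year[year].append(score)
    return {
        year: {
            "mean": statistics.mean(scores),
            "median": statistics.median(scores),
            "p75": nearest_rank_percentile(scores, 75),
            "p95": nearest_rank_percentile(scores, 95),
            "p99": nearest_rank_percentile(scores, 99),
        }
        for year, scores in sorted(by_year.items())
    }


# Toy scores for two years (illustrative only, not taken from the survey).
toy_rows = [(1991, s) for s in [0, 0, 0, 0, 2, 3, 5, 8, 12, 25]]
toy_rows += [(2017, s) for s in [0, 0, 0, 0, 0, 0, 2, 3, 5, 18]]

summary = summarize_by_year(toy_rows)
for year, stats in summary.items():
    print(year, stats)
```

Because the score distribution is strongly right-skewed with many zeros, the upper percentiles carry most of the information about the worst-off units, which is why they are reported alongside the mean and median.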
Figure 4 shows poor housing conditions in the five boroughs of New York City from 1991 to 2017. Overall, housing quality improved in all five boroughs. The Bronx had the worst housing conditions and Staten Island had the best.
Ultimately, we did not arrive at a method to test our index. However, we believe the index should be validated against a variable that is indicative of quality but not measured in the index. We plan to find data to perform such a validation test in the future. Some potential variables were omitted because they only had data for recent years. For example, whether a unit has functioning air conditioning was only measured in 2014 and 2017. This variable would have been useful in measuring housing quality, but we chose not to include such data because it may inflate the index scores for later years. However, there are plans to create stronger indexes for the recent years.
In this paper we have created a housing quality index that measures poor housing conditions. We observe that housing conditions have been slowly improving over time, particularly among units with high index values. Our goal was to measure housing quality, and our proposed index specifically measures poor housing conditions rather than quality in general. We believe it would be beneficial to create several indexes concerning housing quality (e.g., a High Quality Index and a Neighborhood Quality Index) and to consider all of them when assessing housing quality. We also recommend further exploration of the spatial component of the data to see whether factors such as crime and location are related to the index value.
## [1] 152313 35
##
## Call:
## lm(formula = pqi ~ hhinc + X_30a + X_31b, data = data_part2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.400 -3.736 -1.544 1.989 49.613
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.766e+00 2.952e-02 161.446 <2e-16 ***
## hhinc 9.114e-08 2.491e-07 0.366 0.715
## X_30a 6.945e-04 7.086e-05 9.800 <2e-16 ***
## X_31b -1.762e-03 6.938e-05 -25.390 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.083 on 77284 degrees of freedom
## (75025 observations deleted due to missingness)
## Multiple R-squared: 0.03649, Adjusted R-squared: 0.03646
## F-statistic: 975.7 on 3 and 77284 DF, p-value: < 2.2e-16
##
## Call:
## glm(formula = X_25c ~ hhinc + X_30a + X_31b, data = data_part2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.2536 -0.1270 -0.1208 -0.1094 1.0239
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.352e-01 1.980e-03 68.278 < 2e-16 ***
## hhinc -3.651e-08 1.697e-08 -2.152 0.0314 *
## X_30a 1.873e-05 4.731e-06 3.959 7.53e-05 ***
## X_31b -3.328e-05 4.637e-06 -7.176 7.26e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 0.1058602)
##
## Null deviance: 7439.9 on 70105 degrees of freedom
## Residual deviance: 7421.0 on 70102 degrees of freedom
## (82207 observations deleted due to missingness)
## AIC: 41526
##
## Number of Fisher Scoring iterations: 2
1=“Bronx”, 2=“Brooklyn”, 3=“Manhattan”, 4=“Queens”, 5=“Staten Island”
## Predicted
## Actual 1 2 3 4 5
## 1 5952 2605 1483 993 104
## 2 1762 12410 2434 1663 152
## 3 1266 2625 11872 1708 145
## 4 917 1989 1646 7892 167
## 5 138 324 309 504 508
Correct classification rate
## [1] 0.2536487
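
The reported rate can be checked against the confusion matrix above. The following Python sketch copies the matrix as printed (rows are actual boroughs, columns are predicted) and computes the share of correctly classified units. It also divides the number of correct predictions by 152,313, the row count printed earlier in the output; that this row count is the denominator behind the reported figure is an assumption, though the arithmetic is consistent with it.

```python
# Confusion matrix as printed above: rows = actual borough, columns = predicted.
confusion = [
    [5952, 2605, 1483, 993, 104],    # 1 = Bronx
    [1762, 12410, 2434, 1663, 152],  # 2 = Brooklyn
    [1266, 2625, 11872, 1708, 145],  # 3 = Manhattan
    [917, 1989, 1646, 7892, 167],    # 4 = Queens
    [138, 324, 309, 504, 508],       # 5 = Staten Island
]

correct = sum(confusion[i][i] for i in range(len(confusion)))  # diagonal
classified = sum(sum(row) for row in confusion)                 # units in the matrix

accuracy = correct / classified        # share correct among classified units
share_of_all_rows = correct / 152313   # assumed denominator: all rows, incl. unclassified

print(f"correct = {correct}, classified = {classified}")
print(f"accuracy among classified units = {accuracy:.4f}")
print(f"correct / 152313 = {share_of_all_rows:.7f}")
```

Among the 61,568 units that appear in the matrix, about 62.75% are classified correctly. The reported 0.2536487 matches 38,634 / 152,313, which suggests the printed rate counts units dropped for missing data as misclassified.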